Simulating Bagging without Bootstrapping

Authors

  • Anneleen Van Assche
  • Hendrik Blockeel
Abstract

Bagging is a well-known and widely used ensemble method. It operates by repeatedly bootstrapping the data set and invoking a base classifier on each of these bootstraps. By learning several models and combining their predictions, it tends to increase predictive accuracy, but at the cost of efficiency, which makes it slow on large data sets. In this paper we propose a method that simulates bagging: instead of bootstrapping the data and computing statistics for each attribute on each bootstrap, we compute each statistic only once, on the original training set, derive how that statistic would be distributed if it were computed from resampled sets, and then generate the statistic for bootstrapped versions by sampling from that distribution. This procedure reduces the computational complexity of refining a node in bagging from O(N·I) to O(N+I), where N is the number of instances and I the number of bagging iterations. An experimental evaluation reveals, however, that because of unfavorable constant factors hidden in these formulas, the efficiency gain only becomes worthwhile on large data sets; efficient generation of random values according to the required distribution is therefore crucial to truly exploit this improvement.
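To make the idea concrete, here is a minimal Python sketch, not the paper's exact procedure: the paper derives the required distribution analytically, whereas the sketch uses the standard fact that, under resampling with replacement, a cell with original count k out of N instances has a marginally Binomial(N, k/N) count in a bootstrap. The function name simulated_bootstrap_counts is illustrative, and the independent per-cell binomial draws ignore the weak negative correlation between cells of the same node.

```python
import numpy as np

def simulated_bootstrap_counts(counts, iterations, rng):
    """Simulate the cell counts a bootstrap sample would produce,
    without materialising the bootstrap itself.

    counts     : 1-D array of cell counts on the ORIGINAL data
                 (e.g. class counts within one attribute-value branch)
    iterations : number of bagging iterations I

    A cell with original count k out of N instances has a marginally
    Binomial(N, k/N) count under resampling with replacement, so one
    O(1) draw per cell per iteration replaces a full O(N) pass over a
    materialised bootstrap.
    """
    counts = np.asarray(counts)
    n = counts.sum()
    return rng.binomial(n=n, p=counts / n, size=(iterations, len(counts)))

rng = np.random.default_rng(0)
# Original class counts in a node: 60 positives, 40 negatives.
print(simulated_bootstrap_counts([60, 40], iterations=5, rng=rng))
# One simulated (pos, neg) pair per bagging iteration.
```

The counts on the original data are gathered once in O(N); each of the I iterations then costs O(1) per statistic, which is where the O(N·I) to O(N+I) reduction for node refinement comes from.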


Related Articles

Observations on Bagging

Bagging is a device intended for reducing the prediction error of learning algorithms. In its simplest form, bagging draws bootstrap samples from the training sample, applies the learning algorithm to each bootstrap sample, and then averages the resulting prediction rules. More generally, the resample size M may be different from the original sample size N, and resampling can be done...
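For reference, the simplest form described above is only a few lines of code. The sketch below assumes scikit-learn's DecisionTreeClassifier as the base learner and non-negative integer class labels; the parameter resample_size plays the role of M and defaults to the classic choice M = N.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_predict(X, y, X_test, n_models=25, resample_size=None, seed=0):
    """Plain bagging: fit one tree per bootstrap sample of size M
    (drawn with replacement) and combine by majority vote."""
    rng = np.random.default_rng(seed)
    n = len(X)
    m = resample_size or n                 # M defaults to N
    votes = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=m)   # bootstrap indices
        votes.append(DecisionTreeClassifier().fit(X[idx], y[idx]).predict(X_test))
    votes = np.stack(votes).astype(int)
    # Majority vote over the ensemble, per test instance.
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```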


Improving the Robustness of Bagging with Reduced Sampling Size

Bagging is a simple and robust classification algorithm in the presence of class label noise. This algorithm builds an ensemble of classifiers by drawing bootstrap samples, with replacement, of size equal to the original training set. However, several studies have shown that this choice of sampling size is arbitrary in terms of the generalization performance of the ensemble. In this study we discuss how ...


Stratified and Un-stratified Sampling in Data Mining: Bagging

Stratified sampling is often used in opinion polls to reduce standard errors, and it is known as a variance-reduction technique in sampling theory. The most common resampling approach is based on bootstrapping the dataset with replacement. A main purpose of this work is to investigate extensions of the resampling methods in classification problems; specifically, we use decision trees, fr...
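The contrast between the two schemes is easiest to see in code. Below is a minimal sketch of the stratified variant (the function name is illustrative): it resamples within each class, so every replicate preserves the original class proportions exactly, whereas the plain bootstrap lets them fluctuate.

```python
import numpy as np

def stratified_bootstrap(y, rng):
    """Bootstrap indices stratified by class label: sample with
    replacement inside each class, keeping class proportions fixed
    across replicates (the variance-reduction device)."""
    idx = []
    for label in np.unique(y):
        members = np.flatnonzero(y == label)
        idx.append(rng.choice(members, size=len(members), replace=True))
    return np.concatenate(idx)

rng = np.random.default_rng(0)
y = np.array([0, 0, 0, 0, 1, 1])
print(stratified_bootstrap(y, rng))  # always four class-0 and two class-1 indices
```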


Bagging Does Not Always Decrease Mean Squared Error

Bagging is a device intended for reducing the prediction error of learning algorithms. In its simplest form, bagging draws bootstrap samples from the training sample, applies the learning algorithm to each bootstrap sample, and then averages the resulting prediction rules. Heuristically, the averaging process should reduce the variance component of the prediction error. This is supported by emp...
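That heuristic can be made precise with a textbook identity (not taken from this abstract): for I predictors with common variance sigma^2 and pairwise correlation rho at a fixed test point,

```latex
\operatorname{Var}\!\left(\frac{1}{I}\sum_{i=1}^{I} f_i\right)
  = \rho\,\sigma^{2} + \frac{1-\rho}{I}\,\sigma^{2}
```

Averaging therefore removes only the uncorrelated share of the variance and leaves the bias component of the mean squared error untouched, consistent with the title's point that bagging does not always decrease MSE.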


Bagging for robust non-linear multivariate calibration of spectroscopy

This paper presents the application of the bagging technique to non-linear regression models to obtain more accurate and robust calibration of spectroscopy. Bagging refers to combining, into an ensemble model, multiple models obtained by bootstrap re-sampling with replacement, in order to reduce prediction errors. It is well suited to “non-robust” models, such as the non-linear calibration method...
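In the regression setting the combination step is a plain average of member predictions. A minimal sketch, using scikit-learn's DecisionTreeRegressor as a stand-in for the paper's non-linear calibration model:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_regression(X, y, X_new, n_models=50, seed=0):
    """Fit one non-linear regressor per bootstrap resample and
    average their predictions into a single ensemble output."""
    rng = np.random.default_rng(seed)
    n = len(X)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)   # resample with replacement
        preds.append(DecisionTreeRegressor().fit(X[idx], y[idx]).predict(X_new))
    return np.mean(preds, axis=0)          # averaging damps per-model variance
```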




Publication year: 2006